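Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to impressive gains in generalization on downstream tasks. Such models, recently termed foundation models, have been transformational for the field of natural language processing. While similar models have also been trained on large corpora of images, they are not well suited to remote sensing data. To stimulate the development of foundation models for Earth monitoring, we propose to develop a new benchmark comprising a variety of downstream tasks related to climate change. We believe that this can lead to substantial improvements in many existing applications and facilitate the development of new ones. This proposal is also a call for collaboration, with the aim of developing a better evaluation process to mitigate the potential downsides of foundation models for Earth monitoring.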
translated by 谷歌翻译
Estimating the 6D pose of objects is a major field in 3D computer vision. Following the promising results of instance-level pose estimation, research is moving towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets fall short in annotation quality and in the number of poses provided. We propose HouseCat6D, a new category-level 6D pose dataset featuring 1) multi-modality of polarimetric RGB+P and depth, 2) 194 highly diverse objects from 10 household object categories, including 2 photometrically challenging categories, 3) high-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage, and 5) a checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to specify the output image using a text prompt. Inspired by the success of those models, and led by the notion that language was already developed to describe the elements of visual contexts that humans find most important, we introduce an embedding model closely related to a vision-language model. Specifically, we introduce the embedding model S-MAGMA: a 13 billion parameter multimodal decoder combining components from an autoregressive vision-language model MAGMA and biases finetuned for semantic search.
Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
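The ranking step described above can be illustrated with a minimal sketch. The abstract does not name the aggregation functions used, so the two shown here (a utilitarian mean and a Rawlsian minimum) are assumptions chosen for illustration, and `rank_candidates` and the score matrix are hypothetical names and data, not the authors' implementation.

```python
import numpy as np

def rank_candidates(pred_scores, welfare="utilitarian"):
    """Rank candidate statements by aggregated predicted approval.

    pred_scores has shape (n_candidates, n_people); entry [i, j] is a
    hypothetical reward-model prediction of how strongly person j approves
    of candidate i. Returns (candidate indices, best first; aggregated scores).
    """
    scores = np.asarray(pred_scores, dtype=float)
    if welfare == "utilitarian":
        # Assumed aggregation 1: mean approval across the group.
        agg = scores.mean(axis=1)
    elif welfare == "rawlsian":
        # Assumed aggregation 2: approval of the least satisfied member.
        agg = scores.min(axis=1)
    else:
        raise ValueError(f"unknown welfare function: {welfare}")
    order = np.argsort(-agg, kind="stable")  # descending by aggregated score
    return order, agg

# Illustrative (made-up) predictions for 3 candidates and 3 people.
pred = np.array([
    [0.9, 0.9, 0.1],  # strongly liked by two people, rejected by one
    [0.6, 0.6, 0.6],  # moderately acceptable to everyone
    [0.2, 0.3, 0.4],
])

for name in ("utilitarian", "rawlsian"):
    order, _ = rank_candidates(pred, name)
    print(name, order.tolist())
```

With these made-up scores the two aggregations disagree on the top candidate: the mean favors the statement that two people like strongly, while the minimum favors the statement nobody rejects. This is why the choice of welfare function matters for what counts as consensus.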
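As underwater vehicle manipulator systems (UVMS) have become smaller and lighter over the past few years, it has become increasingly important to consider the coupling forces between the manipulator and the vehicle when planning and controlling the system. However, typical methods of handling these forces require an exact hydrodynamic model of the vehicle and low-level torque control on the manipulator, both of which are rare in the field. As a result, many UVMS control methods are kinematics-based and cannot inherently account for these effects. Our work bridges the gap between kinematic control and dynamics by training a recurrent neural network on simulated UVMS data to predict the future pitch of the vehicle from the system's previous states. Kinematic planners and controllers can use this metric to incorporate knowledge of the dynamics without a computationally expensive model, improving their ability to perform underwater manipulation tasks.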
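Real data from domains with high privacy requirements, such as the medical intervention space, is scarce, and its acquisition is legally complex. This work therefore provides a way to create a synthetic dataset for the medical context, using medical clothing as an example. The goal is to close the reality gap between synthetic and real data. To this end, methods based on 3D-scanned clothing and on designed clothing are compared using an Unreal Engine plugin or Unity. In addition, a mixed-reality dataset recorded in front of a green screen and a target-domain dataset were used. Our experiments show that structured domain randomization of designed clothing, together with mixed-reality data, provides a baseline that achieves 72.0% mAP on a test dataset from the clinical target domain. When 15% of the available target-domain training data is additionally used, the gap to using 100% (660 images) of the target-domain training data can be nearly closed, with 80.05% mAP compared to 81.95% mAP. Finally, we show that when 100% of the target-domain training data is additionally used, accuracy can be increased to 83.35% mAP.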
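Annotating abusive language is expensive, logistically complex, and creates a risk of psychological harm. However, most machine learning research has prioritized maximizing effectiveness (i.e., F1 or accuracy score) over data efficiency (i.e., minimizing the amount of data that is annotated). In this paper, we use simulated experiments over two datasets with varying percentages of abuse to demonstrate that transformer-based active learning is a promising approach that substantially raises efficiency while still maintaining high effectiveness, especially when abusive content makes up a smaller percentage of the dataset. This approach requires only a fraction of the labeled data to reach performance equivalent to training on the full dataset.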
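The selection step at the core of an active learning loop can be sketched as follows. The abstract does not state which query strategy is used, so uncertainty sampling is an assumption here, and `select_most_uncertain` and the probabilities are hypothetical illustrations rather than the paper's method.

```python
import numpy as np

def select_most_uncertain(probs, batch_size):
    """Pick the unlabeled items whose predicted probability of being abusive
    is closest to 0.5, i.e. where a binary classifier is least certain.

    probs: 1D array of predicted probabilities for the unlabeled pool.
    Returns the indices of the `batch_size` items to send for annotation.
    """
    probs = np.asarray(probs, dtype=float)
    distance_from_boundary = np.abs(probs - 0.5)
    # A stable sort keeps pool order among equally uncertain items.
    return np.argsort(distance_from_boundary, kind="stable")[:batch_size]

# Made-up classifier outputs for a pool of five unlabeled posts.
pool_probs = [0.95, 0.52, 0.10, 0.45, 0.70]
to_annotate = select_most_uncertain(pool_probs, batch_size=2)
print(to_annotate.tolist())
```

In a full loop, the selected items would be labeled, added to the training set, and the classifier retrained before the next round of selection.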
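Optical coherence tomography (OCT) imaging from different camera devices causes challenging domain shifts and can lead to a severe drop in accuracy for machine learning models. In this work, we introduce a minimal noise adaptation method based on singular value decomposition (SVDNA) to overcome the domain gap between target domains from three different device manufacturers in retinal OCT imaging. Our method exploits differences in noise structure to bridge the domain gap between different OCT devices and to transfer the style of unlabeled target-domain images to source images for which manual annotations are available. We demonstrate how this method, despite its simplicity, matches or even outperforms state-of-the-art unsupervised domain adaptation methods for semantic segmentation on a public OCT dataset. SVDNA can be integrated into the augmentation pipeline of any network with just a few lines of code, in contrast to many state-of-the-art domain adaptation methods, which often require changing the underlying model architecture or training a separate style transfer model. The full code implementation of SVDNA is available at https://github.com/valentinkoch/svdna.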
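The general idea of an SVD-based noise swap can be sketched in a few lines of NumPy. This is one plausible reading of the description above, not the authors' implementation (the linked repository is the authoritative source): the function name `svd_noise_swap`, the split index `k`, and the assumption of equally sized grayscale images scaled to [0, 1] are all hypothetical, and any further steps the published method may apply are omitted.

```python
import numpy as np

def svd_noise_swap(source, target, k):
    """Keep the top-k singular components of `source` (coarse content) and
    add the remaining components of `target` (fine, noise-like structure).

    source, target: 2D float arrays of identical shape, values in [0, 1].
    k: number of leading singular components taken from the source image.
    """
    if source.shape != target.shape:
        raise ValueError("source and target must have the same shape")
    u_s, s_s, vt_s = np.linalg.svd(source, full_matrices=False)
    u_t, s_t, vt_t = np.linalg.svd(target, full_matrices=False)
    # Low-rank reconstruction of the annotated source image.
    content = (u_s[:, :k] * s_s[:k]) @ vt_s[:k, :]
    # Residual components of the unlabeled target image.
    noise = (u_t[:, k:] * s_t[k:]) @ vt_t[k:, :]
    return np.clip(content + noise, 0.0, 1.0)

# Random stand-ins for a source and a target image.
rng = np.random.default_rng(0)
src = rng.random((8, 6))
tgt = rng.random((8, 6))
adapted = svd_noise_swap(src, tgt, k=3)
print(adapted.shape)
```

Because the output keeps the source image's coarse structure, the source segmentation labels remain usable for the adapted image, which is what allows such a step to sit inside an ordinary augmentation pipeline.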
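Spiking neural networks (SNNs) provide an efficient computational mechanism for temporal signal processing, especially when coupled with low-power SNN inference. SNNs have historically been difficult to configure, lacking a general method for finding solutions to arbitrary tasks. In recent years, gradient-descent optimization methods have been applied to SNNs with increasing ease. SNNs and SNN inference processors therefore offer a good platform for commercial low-power signal processing in energy-constrained environments without cloud dependencies. To date, however, these methods have not been accessible to ML engineers in industry, and graduate-level training has been required to configure even a single SNN application successfully. Here we demonstrate a convenient high-level pipeline for designing, training, and deploying arbitrary temporal signal processing applications to sub-mW SNN inference hardware. We use a new, straightforward SNN architecture for temporal signal processing, in which a pyramid of synaptic time constants extracts signal features over a range of temporal scales. We demonstrate this architecture on an ambient audio classification task, deployed to the Xylo SNN inference processor in streaming mode. Our application achieves high accuracy (98%) and low latency (100 ms) at low power (less than 4 µW inference power). Our approach makes training and deploying SNN applications available to ML engineers with a general NN background, without requiring prior experience with spiking NNs. We intend for our approach to make neuromorphic hardware and SNNs an attractive choice for commercial low-power and edge signal processing applications.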
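The idea of extracting features at several temporal scales with a pyramid of time constants can be illustrated with a bank of leaky integrators. This is only a sketch under stated assumptions: it filters an analog signal and contains no spiking neurons, and the doubling schedule, `base_tau`, `dt`, and the function name `synaptic_pyramid` are hypothetical choices, not taken from the paper.

```python
import numpy as np

def synaptic_pyramid(x, dt=1e-3, base_tau=2e-3, n_levels=4):
    """Filter a 1D signal with exponential low-pass filters whose time
    constants double at each level (assumed schedule).

    Returns (taus, out), where out[t, i] is the state of level i at step t.
    """
    x = np.asarray(x, dtype=float)
    taus = base_tau * 2.0 ** np.arange(n_levels)
    decay = np.exp(-dt / taus)  # per-step decay factor of each level
    state = np.zeros(n_levels)
    out = np.empty((x.shape[0], n_levels))
    for t, x_t in enumerate(x):
        # Leaky integration: short taus track the input quickly,
        # long taus respond slowly and retain a longer history.
        state = decay * state + (1.0 - decay) * x_t
        out[t] = state
    return taus, out

# Step input: every level converges to 1, slower for larger tau.
taus, out = synaptic_pyramid(np.ones(2000))
print(taus)
print(out[4])
```

Feeding the same input through all levels gives a downstream layer access to both fast and slow views of the signal at every time step.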